{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dynamic Programming\n",
    "\n",
    "We will be using dynamic programming on sample 4x4 grid world and study **Policy Iteration** \n",
    "\n",
    "![GridWorld](./images/gridworld.png \"Grid World\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Policy Iteration (Improvement)\n",
    "\n",
    "Policy improvement is carried out by repeatedly applying policy evaluation and greedy action selection steps in a cycle till there is no further change. Pseudo code for the algorithm is given in Fig 3-4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Running in Colab/Kaggle\n",
    "\n",
    "If you are running this on Colab, please uncomment below cell and run this to install required dependencies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Uncomment and execute this cell to install all the the dependencies if running in Google Colab or Kaggle\n",
    "\n",
    "## Uncomment and run for Colab\n",
    "# !git clone https://github.com/nsanghi/drl-2ed\n",
    "# %cd /content/drl-2ed \n",
    "# %cd chapter3\n",
    "\n",
    "\n",
    "## Uncomment and run for Kaggle\n",
    "# !git clone https://github.com/nsanghi/drl-2ed\n",
    "# %cd /content/drl-2ed \n",
    "# %cd chapter3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial imports and environment setup\n",
    "import numpy as np\n",
    "import sys\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "sns.set()\n",
    "\n",
    "# create grid world environment\n",
    "from gridworld import GridworldEnv\n",
    "env = GridworldEnv()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Custom print to show state values inside the grid\n",
    "def grid_print(V, k=None):\n",
    "    ax = sns.heatmap(V.reshape(env.shape),\n",
    "                     annot=True, square=True,\n",
    "                     cbar=False, cmap='Blues',\n",
    "                     xticklabels=False, yticklabels=False)\n",
    "\n",
    "    if k:\n",
    "        ax.set(title=\"State values after K = {0}\".format(k))\n",
    "    else:\n",
    "        ax.set(title=\"State Values\".format(k))\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Policy Evaluation\n",
    "\n",
    "def policy_evaluation(policy, env, discount_factor=1.0, theta=0.00001):\n",
    "    \"\"\"\n",
    "    Evaluate a policy given an environment and\n",
    "    a full description of the environment's dynamics.\n",
    "\n",
    "    Args:\n",
    "        policy:[S, A] shaped matrix representing the policy. Random in our case\n",
    "        env: env.P -> transition dynamics of the environment.\n",
    "            env.P[s][a] [(prob, next_state, reward, done)].\n",
    "            env.nS is number of states in the environment.\n",
    "            env.nA is number of actions in the environment.\n",
    "        theta: Stop evaluation once value function change is\n",
    "            less than theta for all states.\n",
    "        discount_factor: Gamma discount factor.\n",
    "\n",
    "    Returns:\n",
    "        Vector of length env.nS representing the value function.\n",
    "    \"\"\"\n",
    "    # Start with a (all 0) value function\n",
    "    V = np.zeros(env.nS)\n",
    "    V_new = np.copy(V)\n",
    "    while True:\n",
    "        delta = 0\n",
    "        # For each state, perform a \"backup\"\n",
    "        for s in range(env.nS):\n",
    "            v = 0\n",
    "            # Look at the possible next actions\n",
    "            for a, pi_a in enumerate(policy[s]):\n",
    "                # For each action, look at the possible next states...\n",
    "                for prob, next_state, reward, done in env.P[s][a]:\n",
    "                    # Calculate the expected value as per backup diagram\n",
    "                    v += pi_a * prob * \\\n",
    "                        (reward + discount_factor * V[next_state])\n",
    "            # How much our value function changed (across any states)\n",
    "            V_new[s] = v\n",
    "            delta = max(delta, np.abs(V_new[s] - V[s]))\n",
    "        V = np.copy(V_new)\n",
    "        # Stop if change is below a threshold\n",
    "        if delta < theta:\n",
    "            break\n",
    "    return np.array(V)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Policy Improvement\n",
    "\n",
    "def policy_improvement(policy, V, env, discount_factor=1.0):\n",
    "    \"\"\"\n",
    "    Improve a policy given an environment and a full description\n",
    "    of the environment's dynamics and the state-values V.\n",
    "\n",
    "    Args:\n",
    "        policy: [S, A] shaped matrix representing the policy.\n",
    "        V: current state-value for the given policy\n",
    "        env: env.P -> transition dynamics of the environment.\n",
    "            env.P[s][a] [(prob, next_state, reward, done)].\n",
    "            env.nS is number of states in the environment.\n",
    "            env.nA is number of actions in the environment.\n",
    "        discount_factor: Gamma discount factor.\n",
    "\n",
    "    Returns:\n",
    "        policy: [S, A] shaped matrix representing improved policy.\n",
    "        policy_changed: boolean which has value of `True` if there\n",
    "                        was a change in policy\n",
    "    \"\"\"\n",
    "\n",
    "    def argmax_a(arr):\n",
    "        \"\"\"\n",
    "        Return idxs of all max values in an array.\n",
    "        \"\"\"\n",
    "        max_idx = []\n",
    "        max_val = float('-inf')\n",
    "        for idx, elem in enumerate(arr):\n",
    "            if elem == max_val:\n",
    "                max_idx.append(idx)\n",
    "            elif elem > max_val:\n",
    "                max_idx = [idx]\n",
    "                max_val = elem\n",
    "        return max_idx\n",
    "\n",
    "    policy_changed = False\n",
    "    Q = np.zeros([env.nS, env.nA])\n",
    "    new_policy = np.zeros([env.nS, env.nA])\n",
    "\n",
    "    # For each state, perform a \"greedy improvement\"\n",
    "    for s in range(env.nS):\n",
    "        old_action = np.array(policy[s])\n",
    "        for a in range(env.nA):\n",
    "            for prob, next_state, reward, done in env.P[s][a]:\n",
    "                # Calculate the expected value as per backup diagram\n",
    "                Q[s, a] += prob * (reward + discount_factor * V[next_state])\n",
    "\n",
    "        # get maximizing actions and set new policy for state s\n",
    "        best_actions = argmax_a(Q[s])\n",
    "        new_policy[s, best_actions] = 1.0 / len(best_actions)\n",
    "\n",
    "    if not np.allclose(new_policy[s], policy[s]):\n",
    "        policy_changed = True\n",
    "\n",
    "    return new_policy, policy_changed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Policy Iteration\n",
    "\n",
    "def policy_iteration(env, discount_factor=1.0, theta=0.00001):\n",
    "\n",
    "    # initialize a random policy\n",
    "    policy = np.ones([env.nS, env.nA]) / env.nA\n",
    "    while True:\n",
    "        V = policy_evaluation(policy, env, discount_factor, theta)\n",
    "        policy, changed = policy_improvement(policy, V, env, discount_factor)\n",
    "        if not changed:  # terminate iteration once no improvement is observed\n",
    "            V_optimal = policy_evaluation(policy, env, discount_factor, theta)\n",
    "            print(\"Optimal Policy\\n\", policy)\n",
    "            return np.array(V_optimal)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimal Policy\n",
      " [[0.25 0.25 0.25 0.25]\n",
      " [0.   0.   0.   1.  ]\n",
      " [0.   0.   0.   1.  ]\n",
      " [0.   0.   0.5  0.5 ]\n",
      " [1.   0.   0.   0.  ]\n",
      " [0.5  0.   0.   0.5 ]\n",
      " [0.   0.   0.5  0.5 ]\n",
      " [0.   0.   1.   0.  ]\n",
      " [1.   0.   0.   0.  ]\n",
      " [0.5  0.5  0.   0.  ]\n",
      " [0.   1.   0.   0.  ]\n",
      " [0.   0.   1.   0.  ]\n",
      " [1.   0.   0.   0.  ]\n",
      " [0.   1.   0.   0.  ]\n",
      " [0.   1.   0.   0.  ]\n",
      " [0.25 0.25 0.25 0.25]]\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAGbCAYAAAAr/4yjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAjPUlEQVR4nO3deXSU9b3H8U9WMiQZQhIMkQCCNAlbEhaRpVDCxZXlSmSxXBqKUIq4UCxYOCpVLz21iEoxVqMgICoulKWERUAtskhL2SoiCBRCgwFCAmQSMslMmPuH9dfODUiUJI88eb/O8ZzM73lIvpPzZN4zzywG+Hw+nwAAkBRo9QAAgO8PogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIA1IKkpCS98MILVo8BfGtEAVft4MGDeuihh5Senq6OHTuqd+/eGjNmjBYvXuy338svv6yNGzd+559z+PBhvfDCC8rLy7vakY2ZM2cqKSlJubm5l93n+eefV1JSkg4cOFBjPxf4viIKuCq7du3S3XffrQMHDmjYsGGaMWOGhg0bpsDAQL3++ut++2ZnZ191FLKysnTixImrHdsYNGiQJGnVqlWX3ScnJ0eJiYlKTk6usZ8LfF8FWz0Arm0vv/yyIiMjtXTpUjmdTr9thYWFFk1VfampqWrZsqVWr16tBx54oMr23bt3Ky8vT7/85S8tmA6oezxSwFU5fvy42rRpUyUIkhQTE2O+TkpK0oULF7R8+XIlJSUpKSlJ06ZNkySdOHFCTzzxhG677TalpKTo5ptv1kMPPeR3mmjZsmWaNGmSJCkzM9N8j7/85S9mn02bNmnkyJFKS0tTp06dNH78eB06dOiK12HQoEH6xz/+oc8++6zKtpycHAUEBGjgwIGqqKjQ73//e2VkZKhLly5KS0vTyJEjtX379iv+jGnTpqlfv35V1l944QUlJSVVWV+5cqUyMjKUkpKibt26afLkycrPz/fb59ixY3rwwQfVq1cvdezYUX369NHkyZPlcrmuOA9wOTxSwFVp1qyZdu/erS+++EKJiYmX3W/WrFl67LHHlJKSouHDh0uSWrRoIUn69NNPtXv3bg0YMEBNmzbViRMntGTJEmVmZmr16tVyOBy66aab9JOf/ESLFy/WhAkT1Lp1a0nSjTfeKElasWKFpk2bph/+8IeaMmWKysrKtGTJEo0cOVLLly9XQkLCZWcbNGiQsrKylJOTo/bt25v1yspKrV27Vl27dtX111+voqIivffeexo4cKCGDRum0tJSLV26VOPGjdN7772ntm3bXvXvU5Jeeukl/f73v9cdd9yhoUOHqqioSG+88Yb+53/+RytWrJDT6VRFRYXGjh2riooKjRo1SrGxsTp16pT+/Oc/q7i4WJGRkTUyC+ohH3AVtmzZ4mvbtq2vbdu2vhEjRvhmzZrl27x5s6+ioqLKvmlpab5f/epXVdbLysqqrO3evduXmJjoW758uVlbu3atLzEx0bd9+3a/fUtKSnxdu3b1PfbYY37rBQUFvi5dulRZv5S7777b16dPH19lZaVZ+/jjj32JiYm+t99+2+fz+Xxer9dXXl7u9+/Onz/v69mzp2/69Ol+64mJib65c+eay7/61a986enpVX7u3LlzfYmJieZyXl6er23btr6XXnrJb7+DBw/62rVrZ9b379/vS0xM9K1du/aK1w34Njh9hKvSq1cvvf322+rXr58OHDigefPmaezYserTp48++OCDan2PsLAw87XH49HZs2fVokULOZ1O7d+//4r/ftu2bSouLtaAAQNUVFRk/gsMDFRqaqrfKabLGTx4sE6ePKkdO3aYtZycHIWEhOj222+XJAUFBSk0NFSSdPHiRZ07d05er1cdOnSo1pzVsWHDBl28eFF33HGH33WJjY1Vy5YtzXWJiIiQJG3ZskVlZWU18rMBidNHqAEpKSnKyspSRUWFDhw4oI0bN2rhwoWaNGmSVqxYoTZt2nzjv3e73crOztayZct06tQp+f7jfwZYnfPjx44dkySNHj36ktu/vgH9JgMGDNDTTz+tnJwc3XzzzSovL9eGDRvUp08fNWrUyOy3fPlyvfbaazp69Kg8Ho9Z/6bTU9/GsWPH5PP5dOutt15ye3DwV3+yzZs315gxY7RgwQKtWrVKXbt2Vb9+/TR48GBOHeGqEAXUmNDQUKWkpCglJUU33HCDpk+frnXr1l3yVT3/6X//93+1bNkyjR49WmlpaYqMjFRAQIAmT57sF4jL+XqfWbNmqUmTJlW2BwUFXfF7xMTEqGfPnlq/fr1mzJihDz/8UKWlpeYlq9JXT/5OmzZN/fv319ixYxUTE6OgoCBlZ2frn//85zd+/4CAgEuuV1ZW+l2+ePGiAgIC9Oqrr15y7oYNG5qvp02bpiFDhuiDDz7Q1q1bNXPmTGVnZ+vdd99V06ZNr3idgUshCqgVHTp0kCSdPn36ivu+//77uuuuu8yrkSSpvLy8yqOEy92wNm/eXNK/b9i/q0GDBmnz5s36+OOPlZOTo4iICL9XDL3//vtq3ry5srKy/GaZO3fuFb+30+lUcXFxlfUvv/zS73KLFi3k8/mUkJCgVq1aXfH7fv0qrIkTJ2rXrl368Y9/rCVLlmjy5MlX/LfApfCcAq7K9u3bL3lvftOmTZJkXiUkfXUv91I3jJe6R7x48eIq96IdDoekqqeUevfurYiICGVnZ/ud0vlaUVFRNa6J1L9/fzkcDr311lv6+OOPdeutt6pBgwZV5vzP67t3717t2bPnit+7RYsWcrlcfu+KPn36tDZs2OC336233qqgoCBlZWVV+b36fD6dPXtWklRSUiKv1+u3PTExUYGBgaqoqKjW9QUuhUcKuCozZ85UWVmZbrnlFrVu3Voej0e7du3S2rVr1axZM2VkZJh927dvr08++UQLFizQddddp4SEBKWmpqpv375auXKlIiIi1KZNG+3Zs0fbtm1TVFSU389q27atgoKC9Oqrr8rlcik0NFTdu3dXTEyMnnjiCT3yyCPKyMjQnXfeqejoaH355ZfatGmTOnfurBkzZlzxuoSHh+u//uu/lJOTI0l+p44kqW/fvlq/fr3uv/9+9e3bV3l5eXr77bfVpk0bXbhw4Ru/95133qnZs2frgQce0E9+8hO53W4tWbJErVq18nt/RIsWLfSLX/xCzz77rE6cOKH+/fsrPDxceXl52rhxo4YPH66xY8dq+/bteuqpp3T77bfrhhtuUGVlpVauXKmgoCDddtttV7yuwOUQBVyVRx55ROvWrdOmTZv0zjvvyOPx6Prrr9fIkSN13333+b2pbdq0aZoxY4bmzJkjt9utIUOGKDU1VY8++qgCAwO1atUqlZeXq3PnzlqwYIHGjRvn97OaNGmiJ598UtnZ2Xr00UdVWVmp119/XTExMRo0aJCuu+46vfLKK5o/f74qKioUFxenrl27+oXpSgYPHqycnBw1adJE3bt399uWkZGhM2fO6J133tGWLVvUpk0bPfPMM1q3bp3++te/fuP3bdy4sbKysvT000/rmWeeUUJCgh5++GHl5uZWedPc+PHjdcMNN2jhwoV68cUXJUlNmzZVr169zOmspKQk/fCHP9RHH32kU6dOyeFwKCkpSa+++qrS0tKqfX2B/y/AV51n8gAA9QLPKQAADKIAADCIAgDAIAoAAIMoAAAMogAAMKr9PgVHp2/+/BrUrNQRw6weod6Zcsfl/38QqHkD28dbPUK9E1aNW3weKQAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDACLZ6gNoQGhKsGfcN0MiB3RQV6dC+Q1/qiRdz9OFfDlg9mq3FhIdqRNdman+9U8lNIxTeIFgT39qjXcfPWz2aLR35dKf2bN6o3IOfqriwQBFR0WrdoZP6jxgrZ+MYq8ezpZ1/26FFC+brwIHPdbaoSJGRTiUlJ2v8hInq1LmL1ePVCFs+Unj1qVF6aFQ/vb1mh6Y880dVXryoFS/cp55pra0ezdZaxjiU2aOFmkSG6khBqdXj2N66N7N1dP8etbuptwaMeVApPftp3yd/1ouPjJPrXKHV49lS7rFjCgwM1LDh92j6YzM0esy9KjxzRveOHqWtmz+2erwaEeDz+XzV2dHR6YHanqVGdG3fUpvfmKrpzy3XnMUfSJIahAZr53uPquCsS+k/fc7iCasndcQwq0f41hqGBik4MEDFbq/Sk2L12yHtr6lHClPuSLR6hG/l6P69apncUYGBgX5r856YpL4Zo3TLPeMsnO7KBraPt3qEGlFWVqYBt/VXUnKyXnplvtXjfKOwapwbst0jhSH90+T1Vmr+sq1mrbzCq4UrP1H31NZKiIuybjibu1BRqWK31+ox6o1W7VL9gvD1miPCqYK84xZNVf84HA41jo6Wy+WyepQaYbsopCY316Hjp+Uqdfut/23fMUlSSlKCBVMBdaPcfUEV7jI1dDayehRbKykp0dmzRTr6jyOaO+c5HT70hW7u3sPqsWqE7Z5obhrr1MmC4irrJ898tRbfhD8W2Ne21UtV6fWoY890q0extakPT9K2rVskSSEhIRo6fITGT5ho8VQ1w3ZRcDQIUbmn6ikMd7nHbAfs6Oj+vfpw6SJ17JGuGzt0tnocW5s0eYoyf3qvTp7M16qVK+TxeFTp9UoNGlg92lWzXRTKyj1qEFL1aoX9KwZl/4oDvrvgwAA5Hf6/43MXPLpYrZcs4Nvyej0qK/F/9BvujFJgYJC5XHAiV2/OflxxzVtpyISpdT2i7XgqKnT+vP8LJBpHRyso6KvfeXLbtmZ94MDBGjEsQ48/Ol3Pzplbp3PWBttF4eSZYl1/XdVTRE1jnZKk/IJr45Uw32cpCU79YWSa39qQl7Yr/3y5NQPZ3PGD+zT/ycl+a1OylqjxdV+9eufcmdNaMHOqwhqGK3P679TA0dCKMW1lz57dGjcm029tzfoP1KxZ1eckQ0JD1Te9n16b94rcbrfCwsLqasxaYbso/P1gnn7U9QeKDA/ze7L5pg43mO24OodOlerBJXv91gpLKiyaxv7iW7bRmMdm+61FREVLki64zmvhb6bI6/Vo/IzneNNaDUlKSlb2vAV+a7GxTS67f7nbLZ/Pp9LSUqLwfbN8425NHt1fYzN6mfcphIYEK/O/u+uvfz+qvFPnrB3QBlzlXu3IPWf1GPWGIyJSbVK6VlmvcJdp0W+nqbjojMb++nnFxvPKupribNRI3Xv0rLJeWFiomBj/8BYXF2vjhvVq2jS+yrZrke2isGNfrv64fpeeenCwmkRH6Mg/z2jUoG5qGR+jCU++afV4tjemZwtJUqvYcEnS7e3jlJrw1em8Bdt47XxNenfuTOUd/lxd0u9UQV6uCvJyzbbQMIfadett4XT2dP+EnykuLk4dU1IVHR2j/PwvtXLFMhWcPq1Zs5+3erwaYbsoSNLYx1/XrycO1I8HdFNjZ0PtO3RCGZNe1tZdR6wezfZ+3qeV3+XBqf9+1ypRqFn5uYclSTs/WqOdH63x2xbVJI4o1IK7htytdWtX643XF8rlcinS6VRKSqqenvWsOnep+mjuWmS7j7mwi2vxYy6uddfax1xc6+zyMRfXknr5MRcAgO+OKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCCq7tj6ohhtTkH/p8pdyRaPUK9M7B9vNUj1Cs5n+VbPUK9MzT1ysc4jxQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYwVYPUJtiwkM1omsztb/eqeSmEQpvEKyJb+3RruPnrR7Nlo58ulN7Nm9U7sFPVVxYoIioaLXu0En9R4yVs3GM1ePZzs6/7dCiBfN14MDnOltUpMhIp5KSkzV+wkR16tzF6vFsqT4c47aOQssYhzJ7tNDxogs6UlCqlIRGVo9ka+vezFZZiUsduvdVTHwznT2Vr+3vL9fBnZ/ogWfmKTLKHn803xe5x44pMDBQw4bfo5jYWLmKi7V61Z907+hRyvpDtnr17mP1iLZTH45xW0fhwMkS3Tpnq4rdXqUnxRKFWnZn5v1qmdxRgYH/Piv5g7RumvfEJG1ft1y33DPOwunsJ2PoMGUMHea3NvyekRpwW3+9sXgRUagF9eEYt3UULlRUWj1CvdKqXeol1xwRThXkHbdgovrH4XCocXS0XC6X1aPYUn04xm0dBViv3H1BFe4yNXTyKK22lJSUyOOp0LmzZ7XqTyt1+NAXGjd+gtVj1Rt2O8aJAmrVttVLVen1qGPPdKtHsa2pD0/Stq1bJEkhISEaOnyExk+YaPFU9YfdjnGigFpzdP9efbh0kTr2SNeNHTpbPY5tTZo8RZk/vVcnT+Zr1coV8ng8qvR6pQYNrB7N9ux4jNsiCsGBAXI6/K/KuQseXfRZNJDNeb0elZUU+62FO6MUGBhkLhecyNWbsx9XXPNWGjJhal2PaCueigqdP+//MurG0dEKCvrq953ctq1ZHzhwsEYMy9Djj07Xs3Pm1umcdlKfj3FbRCElwak/jEzzWxvy0nblny+3ZiCbO35wn+Y/OdlvbUrWEjW+Ll6SdO7MaS2YOVVhDcOVOf13auBoaMWYtrFnz26NG5Ppt7Zm/Qdq1iyhyr4hoaHqm95Pr817RW63W2FhYXU1pq3U52PcFlE4dKpUDy7Z67dWWFJh0TT2F9+yjcY8NttvLSIqWpJ0wXVeC38zRV6vR+NnPGebN/RYKSkpWdnzFvitxcY2uez+5W63fD6fSktLicJ3VJ+PcVtEwVXu1Y7cc1aPUW84IiLVJqVrlfUKd5kW/XaaiovOaOyvn1dsfNV7svj2nI0aqXuPnlXWCwsLFRPjf4NUXFysjRvWq2nT+CrbUH31+Ri3RRS+yZieLSRJrWLDJUm3t49T6r/exLZgmz1eV/x98e7cmco7/Lm6pN+pgrxcFeTlmm2hYQ6169bbwuns5/4JP1NcXJw6pqQqOjpG+flfauWKZSo4fVqzZj9v9Xi2VB+OcdtH4ed9WvldHpwab74mCjUrP/ewJGnnR2u086M1ftuimsTZ4g/m++SuIXdr3drVeuP1hXK5XIp0OpWSkqqnZz2rzl2q3svF1asPx3iAz+er1mt0uj+9qbZnwX+Yckei1SPUOwPbx195J9SYnM/yrR6h3hmaeuVjnI/OBgAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIARXN0dp9yRWJtz4P8Z2D7e6hHqnZzP8q0eoV6ZvfYLq0eod4amXvl2hUcKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAAjGCrB6hNRz7dqT2bNyr34KcqLixQRFS0WnfopP4jxsrZOMbq8Wxn5992aNGC+Tpw4HOdLSpSZKRTScnJGj9hojp17mL1eLbD8V33YsJDNaJrM7W/3qnkphEKbxCsiW/t0a7j560ercbYOgrr3sxWWYlLHbr3VUx8M509la/t7y/XwZ2f6IFn5ikyij+cmpR77JgCAwM1bPg9iomNlau4WKtX/Un3jh6lrD9kq1fvPlaPaCsc33WvZYxDmT1a6HjRBR0pKFVKQiOrR6pxto7CnZn3q2VyRwUG/vss2Q/SumneE5O0fd1y3XLPOAuns5+MocOUMXSY39rwe0ZqwG399cbiRUShhnF8170DJ0t065ytKnZ7lZ4USxSuNa3apV5yzRHhVEHecQsmqn8cDocaR0fL5XJZPYrtcHzXvQsVlVaPUOtsHYVLKXdfUIW7TA2d9iv890VJSYk8ngqdO3tWq/60UocPfaFx4ydYPVa9wPGNq1XvorBt9VJVej3q2DPd6lFsa+rDk7Rt6xZJUkhIiIYOH6HxEyZaPFX9wPGNq1WvonB0/159uHSROvZI140dOls9jm1NmjxFmT+9VydP5mvVyhXyeDyq9HqlBg2sHs3WOL5RE2wRBa/Xo7KSYr+1cGeUAgODzOWCE7l6c/bjimveSkMmTK3rEW3FU1Gh8+f9X4LXODpaQUFf/b6T27Y16wMHDtaIYRl6/NHpenbO3Dqd0y44vutecGCAnA7/m8dzFzy66LNooDpkiygcP7hP85+c7Lc2JWuJGl8XL0k6d+a0FsycqrCG4cqc/js1cDS0Ykzb2LNnt8aNyfRbW7P+AzVrllBl35DQUPVN76fX5r0it9utsLCwuhrTNji+615KglN/GJnmtzbkpe3KP19uzUB1yBZRiG/ZRmMem+23FhEVLUm64Dqvhb+ZIq/Xo/EznuNNPTUgKSlZ2fMW+K3Fxja57P7lbrd8Pp9KS0uJwnfA8V33Dp0q1YNL9vqtFZZUWDRN3bJFFBwRkWqT0rXKeoW7TIt+O03FRWc09tfPKza+6j1ZfHvORo3UvUfPKuuFhYWKifG/USouLtbGDevVtGl8lW2oHo7vuucq92pH7jmrx7CELaJwOe/Onam8w5+rS/qdKsjLVUFertkWGuZQu269LZzOfu6f8DPFxcWpY0qqoqNjlJ//pVauWKaC06c1a/bzVo9nOxzf1hjTs4UkqVVsuCTp9vZxSv3Xm9gWbLv23x9i6yjk5x6WJO38aI12frTGb1tUkzj+aGrYXUPu1rq1q/XG6wvlcrkU6XQqJSVVT896Vp27VL2ni6vD8W2Nn/dp5Xd5cGq8+doOUQjw+XzVej596d782p4F/2Fg+/gr74QalfMZx3hdmr32C6tHqHe2T/vRFffho7MBAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIAR4PP5fNXZ0e2t7VHwn3I+y7d6hHpn9tovrB6hXtn7zntWj1DvlO3OuuI+PFIAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgEAUAgEEUAAAGUQAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABhEAQBgBFs9QG3a+bcdWrRgvg4c+Fxni4oUGelUUnKyxk+YqE6du1g9nu0c+XSn9mzeqNyDn6q4sEARUdFq3aGT+o8YK2fjGKvHs52Y8FCN6NpM7a93KrlphMIbBGviW3u06/h5q0ezndCQYM24b4BGDuymqEiH9h36Uk+8mKMP/3LA6tFqnK0fKeQeO6bAwEANG36Ppj82Q6PH3KvCM2d07+hR2rr5Y6vHs511b2br6P49andTbw0Y86BSevbTvk/+rBcfGSfXuUKrx7OdljEOZfZooSaRoTpSUGr1OLb26lOj9NCofnp7zQ5NeeaPqrx4USteuE8901pbPVqNC/D5fL7q7Oj21vYodaOsrEwDbuuvpORkvfTKfKvHuaycz/KtHuFbO7p/r1omd1RgYKDf2rwnJqlvxijdcs84C6e7stlrv7B6hG+lYWiQggMDVOz2Kj0pVr8d0v6aeqSw9533rB6hWrq2b6nNb0zV9OeWa87iDyRJDUKDtfO9R1Vw1qX0nz5n8YTVV7Y764r72PqRwqU4HA41jo6Wy+WyehTbadUu1S8IX685IpwqyDtu0VT2daGiUsV2ubf2PTakf5q83krNX7bVrJVXeLVw5SfqntpaCXFR1g1XC+pFFEpKSnT2bJGO/uOI5s55TocPfaGbu/eweqx6odx9QRXuMjV0NrJ6FOA7SU1urkPHT8tV6vZb/9u+Y5KklKQEC6aqPbZ+ovlrUx+epG1bt0iSQkJCNHT4CI2fMNHiqeqHbauXqtLrUcee6VaPAnwnTWOdOllQXGX95Jmv1uKb2OsOT72IwqTJU5T503t18mS+Vq1cIY/Ho0qvV2rQwOrRbO3o/r36cOkideyRrhs7dLZ6HOA7cTQIUbmn6mk6d7nHbLcTW0TBU1Gh8+f9n1xrHB2toKAgSVJy27ZmfeDAwRoxLEOPPzpdz86ZW6dz2oXX61FZif89p3BnlAIDg8zlghO5enP244pr3kpDJkyt6xFtJTgwQE6H/5/quQseXazWS0RwtcrKPWoQUvWmMuxfMSj7VxzswhZR2LNnt8aNyfRbW7P+AzVrVvVcX0hoqPqm99Nr816R2+1WWFhYXY1pG8cP7tP8Jyf7rU3JWqLG18VLks6dOa0FM6cqrGG4Mqf/Tg0cDa0Y0zZSEpz6w8g0v7UhL21X/vlyawaqZ06eKdb111U9RdQ01ilJyi+4Nl7tVV22iEJSUrKy5y3wW4uNbXLZ/cvdbvl8PpWWlhKF7yC+ZRuNeWy231pEVLQk6YLrvBb+Zoq8Xo/Gz3iON63VgEOnSvXgkr1+a4UlFRZNU//8/WCeftT1B4oMD/N7svmmDjeY7XZiiyg4GzVS9x49q6wXFhYqJsb/Rqm4uFgbN6xX06bxVbahehwRkWqT0rXKeoW7TIt+O03FRWc09tfPKzbeXq/KsIqr3KsdueesHqPeWr5xtyaP7q+xGb3M+xRCQ4KV+d/d9de/H1XeqXPWDljDbBGFy7l/ws8UFxenjimpio6OUX7+l1q5YpkKTp/WrNnPWz2e7bw7d6byDn+uLul3qiAvVwV5uWZbaJhD7br1tnA6exrTs4UkqVVsuCTp9vZxSk346lTHgm28N6Qm7NiXqz+u36WnHhysJtEROvLPMxo1qJtaxsdowpNvWj1ejbN1FO4acrfWrV2tN15fKJfLpUinUykpqXp61rPq3KXqPV1cnfzcw5KknR+t0c6P1vhti2oSRxRqwc/7tPK7PDg13nxNFGrO2Mdf168nDtSPB3RTY2dD7Tt0QhmTXtbWXUesHq3G1buPubhWXIsfc3Gtu9Y+5uJad618zIWd8DEXAIBvhSgAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwiAIAwCAKAACDKAAADKIAADCIAgDAIAoAAIMoAAAMogAAMIgCAMAgCgAAgygAAAyiAAAwAnw+n8/qIQAA3w88UgAAGEQBAGAQBQCAQRQAAAZRAAAYRAEAYBAFAIBBFAAABlEAABj/B/YEANIUIjtgAAAAAElFTkSuQmCC",
      "text/plain": [
       "<Figure size 640x480 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Run policy iteration on Grid world\n",
    "V_star = policy_iteration(env)\n",
    "\n",
    "# Print optimal policy state values\n",
    "grid_print(V_star.reshape(env.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "We see that optimal state values are negative of number of steps required to reach the closest terminal state. As the reward is -1 for each time step till agent reaches terminal state, the optimal policy would take the agent to terminal state in minimal number of possible steps. For some states, more than one action could lead to same number of steps to reach terminal state. For example, look at top right state with value -3, it takes 3 steps to reach the terminal state at top-left or terminal state at bottom-right. In other words, the state values is negative of Manhattan distance between the state and nearest terminal state."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
